PROBLEMS OF CLASSIFYING FAILURES OF MACHINES
AND THEIR COMPONENTS 

by 

R. V. Kugel' 

Ya. B. Shor 

Because of the increased attention paid to the reliability of
machine-building products, to the organization of tests and of observations
of products in operation, and to the analysis of the results of these
observations by the methods of reliability theory, a series of problems
inevitably arises concerning the nature of failures and their
classification, recording, and analysis. This paper discusses the
fundamentals of these problems. What is a failure, and can any specific
event be regarded as a failure? How should the failures appearing during
the operation of a complex multicomponent machine be classified? Do all
failures deserve the same attention, accounting, and recording? Which
failures should be used as a basis for estimating the failure-free
performance and the service life of a machine?


A failure is generally understood as a complete or partial loss by a
product of its ability to perform, i.e., of the state in which the product
meets, at a given moment of time, all requirements established for all
basic parameters that characterize the normal execution of its specified
functions. 1 It follows that when a defect affects only the minor
parameters of the product without impairing its normal work, it should be
regarded as a defect in the product but not as a failure. In such a case,
what is a defect of the machine as a whole may at the same time be regarded
as the failure of one of its components.

Failures can be classified from different points of view and by different
symptoms, following the scheme shown in the table. According to this
scheme, a distinction is first made between failures appearing under normal
operating conditions (where the rules for using and servicing the machine
are complied with) and failures appearing under abnormal operating
conditions. From here on we consider only failures appearing under normal
operating conditions. Of the failures appearing under abnormal conditions
we remark only that all of them can be divided into failures caused by
incorrect handling by the attending personnel (violation of the established
sequence of operations, use of inadmissible loads, work under unfavorable
climatic conditions, etc.) and failures caused by natural forces (icing,
snowstorms, bad roads, etc.).

No.  Classifying symptom                 Failure
---  ----------------------------------  ----------------------------------
 1   Originating conditions              Appeared under normal or under
                                         abnormal conditions
 2   Causes of failures                  Not caused by destruction; due to
                                         destruction
 3   Possibilities of further use of     Impossible (complete failure);
     product                             only partly possible (partial
                                         failure)
 4   Character of change in product's    Sudden; gradual
     parameters
 5   Presence of external                Obvious (explicit); hidden
     manifestations                      (implicit)
 6   Relationship between the failures   Independent; dependent
     of components of the product
 7   Consequences                        Dangerous; safe; heavy; light
 8   Method of eliminating               By replacing the part, by
                                         adjusting, cleaning;
                                         self-eliminating
 9   Difficulty of eliminating           Simple; complicated
10   Frequency of failures               Once; systematically repeated
11   Possibilities of predicting         Unpredictable; predictable by age
                                         (accrued operating time) or by
                                         parameter
12   Origin                              Design; technology; due to
                                         operation
13   Possibilities of avoiding the       Can be avoided; cannot be avoided
     causes of failures

1 "The Reliability of Technical Systems and Products," Collection of
Recommended Terms of the Committee on Scientific and Technical Terminology
of the Academy of Sciences, USSR, Nauka (Science), No. 67a, 1965.

According to the causes of their appearance, failures of mechanical
systems are divided into those not connected with the destruction of the
system's components and those caused by their destruction. Here we agree to
call "destruction" the final result of any process leading to a loss of
ability to perform (wear, chipping, breakage, corrosion, aging, etc.).

Examples of failures not connected with destruction are the clogging of
the fuel supply system of transport vehicles, leakage from built-up hoses,
fastenings loosened by vibration, carbon deposits on pistons, dirty or
loose contacts of electrical wires, etc. However, the number of possible
failures of this type is rather small compared to the list of failures
caused by some kind of destruction.

Irrespective of its cause, any disruption of a machine's ability to
perform must be investigated in order to avoid its recurrence. However, the
main attention should be paid to failures caused by the destruction of
machine parts and, therefore, dependent on their service life.


The failure-free performance of mechanical systems is determined
predominantly by their ability to withstand destruction. For example, the
most mass-produced of complex machines, the automobile, when designed
correctly, made in accordance with modern requirements, and maintained
normally, works practically without failures until the onset of gradually
recurring failures caused by the destruction of its components.

A complete failure, as listed in the classification table, means that
the product cannot be used for its purpose until the failure is eliminated;
a partial failure leaves the partial use of the product possible. Thus, a
forced stoppage of an engine caused by some impairment of its reliability
is a complete failure, while its operation with reduced power or with
excessive consumption of lubricant is a partial failure.

A sudden failure appears unexpectedly as a result of an abrupt change in
the value of one or several of the basic parameters of the product, i.e.,
as a result of a sudden complete loss by some of its components of their
ability to operate; a gradual failure results from a gradual change in the
values of the parameters of the product, with an increasing loss of
efficiency of its components. Typical examples of gradual failures are
those which result from the wear of joints. Wear develops gradually; it
steadily increases the clearance between parts and distorts the correct
shape of the contacting surfaces, and upon reaching its ultimate wear the
joint becomes unfit.

Such an event is usually preceded by direct or indirect symptoms (knocks,
larger free play, more friction, etc.) which make it possible to predict
the failure. Particular attention should be given to the widespread case of
the gradual destruction of parts by fatigue, which results not in a
gradual but in a sudden failure. It takes a long time, even years, for
fatigue damage to accumulate in a part until a crack (invisible or
unnoticeable in many cases) makes its appearance; further growth of the
crack results in a sudden breakage of the part.

An obvious (explicit) failure is clearly displayed as soon as it appears
or soon thereafter, while a hidden (implicit) failure may remain unnoticed
for a long time. Thus, poor lubrication may not be detected until it causes
damage; a leak in the cooling system may remain unnoticed until the unit
overheats; an instrument may for a long time furnish incorrect readings
which are believed to be reliable.

Turning to failures classified by the relationship between the failures
of components, let us agree that an independent failure is one caused by
any reason other than another failure, while a dependent failure is one
resulting from another failure. For example, a chip from a broken tooth of
one gear may damage several parts of the transmission; a broken valve head
will break the piston, bend the connecting rod, and score the engine
cylinder; a stoppage of lubrication caused by the failure of the oil pump
will damage the rubbing surfaces of various units of the mechanism; etc.
In analyzing the reliability of a product, predominant attention should be
paid to independent (primary, so to speak) failures, which are the original
causes of products losing their ability to operate. In many cases, however,
the danger of a chain of interconnected failures testifies to the
vulnerability of the design as a whole and should be taken into
consideration when estimating its reliability.

The classification of failures by consequences reduces to dividing
failures, first, into dangerous and safe and, second, into heavy and light.
Failures threatening the life and health of people are considered
dangerous, while those free of this menace are regarded as safe; failures
resulting in heavy losses are heavy failures, and a failure is light when
it causes no serious loss. The unavoidable vagueness of this division and
the possibility of subjective estimates should be noted. If, say, a
diesel generator goes out of order and leaves an entire region without
electric power, this is indisputably a heavy failure; but the consequences
of the stoppage of, for example, a tractor at the peak of agricultural
work, which also results in obvious though lesser damage, may be evaluated
differently. Since the division of failures of specific products into light
and heavy is the job of specialists of the respective enterprises or
branches of industry, it is hardly advisable to demand from them an
estimate on the scale of the entire country. The estimate should evidently
be made from the standpoint of a given enterprise and be based on the
peculiarities of its products and of their operation; within a branch of
industry, however, it is desirable to assure uniformity of judgment.

Note that the consequences of a failure of a machine depend at times on
the circumstances under which the machine is used. For example, a forced
stoppage of a truck under usual working conditions is, by its
consequences, a safe failure, but it is a dangerous failure when the
stoppage occurs during the winter in the polar regions. For machines whose
work is of particular technical and economic importance during certain
seasons, a failure considered light in the period between seasons may
prove to be heavy at the height of the season.

According to the method of their elimination, failures are divided into
those that can be eliminated by replacing the faulty part (this includes
all failures caused by the destruction of parts); those that can be
eliminated by adjusting, tuning, and tightening (failures caused by
maladjustment of devices, loosened fastenings, etc.); those that can be
eliminated by cleaning, shaking, or air-blowing (failures caused by oily
contacts, carbon deposits, plugged pipelines, etc.); and those that
eliminate themselves (cutoffs). Among the last belong, for example, the
automatic disconnection of a unit on overheating caused by heavy
overloading, the brief plugging of a pipeline supplying fuel or oil, etc.

According to the difficulty of their elimination, failures are divided
into simple and complicated. Simple failures are those that can be easily
eliminated and require no prolonged or difficult repairs; complicated
failures are those that are difficult to eliminate. Because the difficulty
of eliminating a failure is judged subjectively, definitions applying
specifically to certain types of machines are desirable.

For tractors and cars, for example, it was proposed to consider a failure
simple when it can be eliminated with the driver's set of tools. When the
elimination of the failure requires other tools, accessories, or spare
parts, or the aid of a repair crew, so that the vehicle must be returned to
its base (or technical help must be called), the failure is complicated.
The difference between simple and complicated failures is especially
pronounced when the machine works far away from its base. Under such
conditions a complicated failure may cause prolonged idleness and,
correspondingly, large losses.

According to the frequency of their occurrence, a distinction is made
between single (episodic) failures, which result from random circumstances
and are not natural or regular for products of a given model, and
systematically repeating failures, which, in contrast to single failures,
result from causes intrinsic to the given model, i.e., from its design and
production technology.

The importance of predicting failures is obvious: a failure that can be
predicted makes it possible to reduce or eliminate its consequences, and
in many cases steps taken in due time can even prevent the failure from
appearing. Certain failures are, unfortunately, unpredictable, i.e., they
cannot be foreseen; among them are, in particular, the random failures
whose distribution follows the exponential law. However, the majority of
failures, including those caused by destruction, can be predicted,
provided their law of distribution is known. In other words, knowledge of
the regularities in the distribution of failures is the key to the ability
to predict them.

Predictable failures are divided into failures whose appearance depends
definitely on the age of products or on the accrued time of their
operation (for example, those that occur as a result of wear, aging, or
accumulated fatigue), and failures resulting from a change in some
parameter of the product.


The first case includes the failures of unrestored products for which
the curve of wear and tear, p(t), is known (Figure 1). Here, p(t) is the
probability of no failure taking place within the time interval from 0 to
t. Similarly, p(t + τ) is the probability of no failure within the time
interval from 0 to t + τ. Then, from the theorem of multiplication of
probabilities, we find that, provided no failure has taken place prior to
moment t, the probability of absence of failures during the interval τ
will be p(t + τ)/p(t). Hence, the probability of a failure within the
interval τ will be

$$Q_\tau = 1 - \frac{p(t + \tau)}{p(t)}. \tag{1}$$

This equation enables us to predict failures taking place within the time
interval τ when the accrued operating time t prior to this interval is
known (provided no failures have occurred prior to moment t). Assume, for
example, that the accrued operating time prior to the failure of an
unrestored product is distributed normally with a mean of 1000 hours and a
standard deviation σ = 300 hours. For τ = 100 hours, according to equation
(1), the value of Q_τ (in percent) as a function of the accrued operating
time t will be as follows 2:

t, hours     200     500     2000
Q_τ, %       0.6     5       75


The data indicate that, if a product has operated for 200 hours, a failure
during the subsequent 100 hours is hardly probable (0.6 percent); after
2000 hours of operation, however, a failure during the subsequent 100
hours is highly probable (75 percent). It is therefore expedient, as a
preventive step, to replace the product not later than after 2000 hours of
operation.
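The worked example can be checked numerically. The sketch below is only an illustration under stated assumptions: it applies equation (1) to an untruncated normal law with the mean and standard deviation quoted in the text, and the function names are ours, not the authors'. Under these assumptions the first two values agree with the tabulated figures after rounding, while the value at 2000 hours comes out somewhat lower than the tabulated one; the original tables may have been computed differently.

```python
import math


def survival_normal(t, mean, sigma):
    """p(t): probability of no failure in [0, t] for a normal time-to-failure law.

    Assumption: the normal law is not truncated at zero.
    """
    # 1 - Phi((t - mean) / sigma), written with erfc for accuracy in the tail
    return 0.5 * math.erfc((t - mean) / (sigma * math.sqrt(2.0)))


def conditional_failure_probability(t, tau, survival):
    """Q_tau = 1 - p(t + tau) / p(t), i.e. equation (1)."""
    return 1.0 - survival(t + tau) / survival(t)


def p(t):
    # Parameters taken from the example in the text
    return survival_normal(t, mean=1000.0, sigma=300.0)


for t in (200, 500, 2000):
    q = conditional_failure_probability(t, tau=100.0, survival=p)
    print(f"t = {t:5d} h   Q_tau = {100.0 * q:5.1f} %")
```

Any other survival function p(t) can be passed to `conditional_failure_probability` in the same way, for instance one read off an empirical curve.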

For random failures, by contrast, the distribution of the time prior to
failure follows the exponential law:

$$p(t) = e^{-\lambda t}, \tag{2}$$

where λ is the failure rate (intensity of failures).

From equations (1) and (2) we obtain for this case

$$Q_\tau = 1 - \frac{e^{-\lambda (t + \tau)}}{e^{-\lambda t}} = 1 - e^{-\lambda \tau}, \tag{3}$$

i.e., Q_τ is independent of the accrued time t, which indicates that such
failures cannot be predicted from the accrued operating time.
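The independence from t can also be seen numerically. In the following sketch the failure rate of 0.001 per hour is a hypothetical value chosen only for illustration; the function names are ours.

```python
import math


def survival_exponential(t, lam):
    """p(t) = exp(-lam * t), the exponential law."""
    return math.exp(-lam * t)


def q_tau(t, tau, lam):
    """Conditional failure probability 1 - p(t + tau) / p(t) for the exponential law."""
    return 1.0 - survival_exponential(t + tau, lam) / survival_exponential(t, lam)


lam = 0.001   # hypothetical failure rate, 1/hour
tau = 100.0   # length of the interval of interest, hours

closed_form = 1.0 - math.exp(-lam * tau)
for t in (200.0, 500.0, 2000.0):
    # The value is the same for every accrued time t
    print(t, q_tau(t, tau, lam), closed_form)
```

Whatever accrued time is substituted, the result equals 1 - exp(-lam * tau), so knowledge of t adds nothing to the forecast.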

The forecasting of failures based on a parameter can be applied, for
example, to automatic metalworking machine tools and dimension-checking
devices. The most important parameter of a machine tool or of such a
device is the precision with which a given operation is executed under
certain working conditions. A precision index that does not meet the
established limit indicates a lost ability to operate, i.e., a failure.
Such a failure can be predicted from the value of the precision parameter:
this value changes as the accrued operating time increases, and its
approach to the tolerance limit enables us to judge that a failure is
coming.

2 Shor, Ya. B., "Statistical Methods of Analysis and Control of Quality
and Reliability," Sovetskoye Radio (Soviet Radio), 1962.

In many cases, failures caused by destruction (wear) can also be
considered failures whose prediction is based on a parameter: an
approaching failure can be judged by monitoring the wear of a part.
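One simple way to turn such monitoring into a forecast is to extrapolate the measured parameter to its limit. The sketch below assumes, purely for illustration, that the parameter (here wear) grows linearly with accrued operating time toward an upper limit; the paper does not prescribe any particular trend model, and the measurements and the 0.20 mm limit are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares line: values ~ intercept + slope * times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept, slope


def predicted_time_to_limit(times, values, limit):
    """Accrued operating time at which the fitted trend reaches `limit`.

    Assumes the parameter increases toward an upper limit; returns None
    when the fitted trend is not rising.
    """
    intercept, slope = fit_line(times, values)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical wear measurements (mm) at accrued operating times (hours)
hours = [0, 100, 200, 300, 400]
wear_mm = [0.00, 0.02, 0.04, 0.06, 0.08]

t_fail = predicted_time_to_limit(hours, wear_mm, limit=0.20)
print(t_fail)
```

In practice wear curves are often not linear over the whole life of a joint, so a forecast of this kind is best refreshed as new measurements arrive.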

In analyzing failures and preparing steps for their elimination, it is
important to determine their origin (defects in design, in production
technology, or in operation) and then to divide the failures into those
whose causes can and cannot be eliminated (correctable and irreparable).
Among the correctable failures belong those connected with design and
technology; they are most often disclosed during the "working-in" and
finishing of new products. Failures due to design are caused by design
defects and can be eliminated by a better design; failures due to
technology are caused by defects in production technology or by defective
technological control and can be eliminated by improving the technology
and the inspection methods in production.

The causes of correctable failures can be eliminated entirely or
neutralized to a degree that makes the appearance of a failure possible
only after the amortization period of the machine has expired. For
example, artificial aging of housing parts eliminates their deformation
caused by internal stresses; the use of stainless steel or of polymer
materials prevents the destruction of parts by corrosion; the use of
modern methods of designing, producing, assembling, and locking threaded
products can practically eliminate failures caused by loosened threaded
joints or by the unscrewing of parts; many machine parts can be so
designed and made that no repairs will be needed during their service.

The irreparable failures include those for which a method of elimination
has not yet been found, or whose elimination requires an economically
unjustified cost. For irreparable failures, the problem reduces to finding
means that will assure their timely repair or that will reduce their
consequences.

The division of failures by their origin is often difficult and is not
always indisputable. For example, the breakage of a transmission gear may
be caused by defective assembly of the unit or by faulty heat treatment of
the gears, but this does not rule out an inadequate margin of strength as
the cause. A broken engine crankshaft may be due to an inadequate margin
of strength, but it may also be due to an excessive unbalance of the
flywheel, couplings, and other parts connected to the crankshaft; it is
also possible that the strength of the crankshaft was reduced by some
stress concentrator of technological origin or by residual stresses
arising during the straightening of the crankshaft between operations. The
rapid wear of some joint may turn out to be the result of defective design
or technology, but it may also be due to violation of the lubrication
rules by the attending personnel. For this reason, determining the causes
of failures in complicated cases requires a thorough study of all possible
sources.

This classification of failures, while covering most of the items used in
machine-building practice, is not exhaustive. In certain special cases and
fields it may prove advisable to single out other classifying symptoms or
to use a more detailed classification. For example, I. N. Velichkin
suggested that the failures of tractor engines be divided, according to
the difficulty of their elimination, into failures that can be eliminated
without replacing parts and without dismantling the engine; failures whose
elimination requires replacing parts without even a partial dismantling of
the engine (i.e., without removing parts which open the inside space of
the engine); failures requiring partial dismantling of the engine, opening
its inside space but with none of its pistons removed; failures requiring
an overhaul of the engine; failures requiring major repair of the engine;
and failures leading to discarding the engine.

In operation, cases sometimes arise in which a machine is unexpectedly
unable to perform some operation or performs it with interruptions. For
example, a transport vehicle begins to skid on a bad road, a tractor meets
increased resistance of the soil and fails to plow to the needed depth, or
the units of a harvesting machine become packed with straw. Such events
can be regarded as failures only when the given model is specially
designed for the given operation under exactly such conditions.

Mention should be made of defects in a product which do not impair its
ability to perform, i.e., the product meets the established requirements
as far as the basic parameters are concerned but does not satisfy
requirements of secondary importance (for example, external appearance,
convenience of operation, etc.). As follows from the definition of a
failure, such defects are not failures. There are, however, exceptional
cases in which the existing rules do not allow a defective item to work,
i.e., it is essentially made unworkable. For example, a defective control
instrument showing the state of some important unit of a machine, or the
course of a process performed by it, may not affect the workability of the
machine, but continued work of the machine may cause a breakdown or
considerable damage, and the work should therefore be discontinued.

Should all failures and defects be registered? In our opinion, all of them
should be. This, however, can be accomplished only in tests and in
operation carried out and observed by special personnel; in ordinary
operation, the information arriving from consumers in the majority of
cases contains no data on failures and defects of secondary importance.

Let us consider the sequence of work in the most difficult case: the
systematization and processing of complete information on a variety of
failures in a large group of complex, multicomponent machines of the same
model.

The work should be started by selecting the groups which are expedient 
in this case for classifying the failures; this will depend on the purpose and 
features of the product and also on the purpose of the investigation. 

For a study of failures of mass-produced machines (automobiles, tractors)
it is desirable to use at least five groups of failures, classified by the
causes of their appearance, the relationship between the failures of
components, the consequences, the difficulty of elimination, and the
possibility of eliminating the causes of failures (groups 2, 6, 7, 9, and
13). Classification of the failures by the frequency of their occurrence
and by the possibility of their prediction can easily be accomplished at
the concluding stage of the investigation by using the results obtained in
processing the information.

Next, after the failures are separated from the defects (which are to be
analyzed separately), the failures must be sorted by component. For this
it is recommended that all basic information on each failure be recorded
on a separate card: in particular, the name of the component that failed,
the name and character of the failure, the accrued operating time prior to
the failure, the circumstances of its appearance and its consequences, the
list of work done and the time spent on its elimination, the list and cost
of replaced parts, and the numbers of the groups of the classification of
failures employed. For a large volume of information, and with a suitable
coded form of recording, the information on the cards can be processed by
machine.
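A coded card of this kind might be represented as follows. This is a minimal sketch, not the authors' card format: the field names, the numeric coding, and the sample values are all hypothetical, and only the five classification groups recommended for mass-produced machines (2, 6, 7, 9, and 13) are coded.

```python
from dataclasses import dataclass, field
from enum import Enum


class Cause(Enum):          # group 2: causes of failures
    NO_DESTRUCTION = 1
    DESTRUCTION = 2


class Relationship(Enum):   # group 6: relationship between component failures
    INDEPENDENT = 1
    DEPENDENT = 2


class Difficulty(Enum):     # group 9: difficulty of eliminating
    SIMPLE = 1
    COMPLICATED = 2


@dataclass
class FailureCard:
    component: str              # name of the component that failed
    failure: str                # name and character of the failure
    hours_to_failure: float     # accrued operating time prior to the failure
    circumstances: str
    consequences: str
    repair_work: str            # work done to eliminate the failure
    repair_hours: float         # time spent on elimination
    cause: Cause
    relationship: Relationship
    dangerous: bool             # group 7: dangerous or safe
    heavy: bool                 # group 7: heavy or light
    difficulty: Difficulty
    avoidable: bool             # group 13: causes can or cannot be avoided
    replaced_parts: dict = field(default_factory=dict)  # part name -> cost

    def code(self) -> tuple:
        """Numeric code of the card for groups 2, 6, 7 (two digits), 9, 13."""
        return (
            self.cause.value,
            self.relationship.value,
            1 if self.dangerous else 2,
            1 if self.heavy else 2,
            self.difficulty.value,
            1 if self.avoidable else 2,
        )

    def parts_cost(self) -> float:
        """Total cost of the parts replaced while eliminating the failure."""
        return sum(self.replaced_parts.values())


# Hypothetical card
card = FailureCard(
    component="oil pump", failure="drive gear tooth broken",
    hours_to_failure=1850.0, circumstances="normal operation",
    consequences="loss of lubrication", repair_work="pump replaced",
    repair_hours=3.5, cause=Cause.DESTRUCTION,
    relationship=Relationship.INDEPENDENT, dangerous=False, heavy=True,
    difficulty=Difficulty.COMPLICATED, avoidable=True,
    replaced_parts={"oil pump": 12.0},
)
print(card.code(), card.parts_cost())
```

Keeping the classification as a short numeric code makes it easy to sort and count cards by any group without re-reading the free-text fields.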

The data, and the entire analysis of reliability, should be processed by
proceeding from the particular to the general, following the
"part-assembly-unit-machine" pattern. The result is a list of failures for
each part, assembly, and unit. With all cases arranged in order of
increasing accrued operating time prior to failure, the respective
statistical characteristics are calculated for each failure. 3 Next,
illustrative curves are plotted: curves of the distribution of the accrued
operating time prior to failure (with the accrued time laid off on the
abscissa and the frequency of failures on the ordinate) and curves of
decrease (with the accrued operating time prior to failure laid off on the
abscissa and the percentage of products continuing to work without
failures on the ordinate). The ordinate of a curve of decrease indicates
the probability of failure-free work for the selected accrued operating
time.

3 Shor, Ya. B., and Kugel', R. V., "Indicators of Reliability of
Industrial Products" ("Pokazateli nadezhnosti promyshlennykh izdeliy"),
Collection of Articles on Standardization and Improvement of Quality of
Industrial Products, Izdatel'stvo Standartov (Publishers of Standards),
1965.

For unrestorable machine components, which are replaced immediately after
the first failure, the characteristics of failure-free performance
obtained in this manner coincide with the characteristics of service life.
For components restored during operation, these characteristics do not
coincide; therefore the service life of such components must be
characterized separately and be based only on the information about those
failures which were caused by some process of destruction (for this, the
second group of the classification table should be used).

The processed data of a fairly large group of machines provide a complete
statistical picture of the failure-free performance and the service life
of the components of a machine model investigated under specific testing
and operating conditions. Only statistical estimates should be used:
attempts, frequent in practice, to estimate reliability from data obtained
on a small group of machines give neither complete nor reliable results.

The reliability of a machine cannot be characterized by a simple summation
of all observed failures of its components. The results of analyzing
failures by the proposed method make it possible in the majority of cases
to divide the failures into essential and unessential (an unessential
failure is one which is quickly detected and quickly eliminated).
Unessential failures can sometimes be excluded from the overall estimate
of the reliability of a complex machine; also frequently excluded are
failures disclosed and eliminated during the preventive inspections
prescribed by the manufacturer's instructions; other limitations depending
on the special features and purpose of the machine are also possible.

Therefore, an analysis of failures must cover several subdivisions in
order to obtain a fairly complete estimate of reliability. Only then is it
possible to obtain information on the relationships between various kinds
of failures, for example, between essential and unessential, dangerous and
safe, and simple and complicated. Such an analysis also makes it possible
to calculate the accrued operating time per failure of a given kind, for
example, per any failure, per essential failure, and per dangerous
failure, to analyze the regularities in the distribution of failures, and
to obtain the prerequisites for a thorough analysis of the causes and
consequences of failures. In short, the picture obtained is far more
complete than the one obtained by simply recording failures without a
detailed classification.
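The processing steps described above can be sketched in a few lines. The example below is an illustration under simplifying assumptions: every recorded failure is treated as the first failure of a different machine, all machines are observed for the same time, and the observation data are hypothetical. It builds the points of a curve of decrease and computes the accrued operating time per failure for several classes of failures.

```python
def decrease_curve(hours_to_first_failure, n_products):
    """Points (t, share of products still working without failure after t).

    Assumes each entry is the first failure of a different product out of
    n_products observed in total.
    """
    points = []
    for failed, t in enumerate(sorted(hours_to_first_failure), start=1):
        points.append((t, 1.0 - failed / n_products))
    return points


def accrued_time_per_failure(total_hours, failures, selector=lambda f: True):
    """Total accrued operating time divided by the number of selected failures.

    Returns None when no failure of the selected kind was observed.
    """
    count = sum(1 for f in failures if selector(f))
    return total_hours / count if count else None


# Hypothetical observations: 10 machines, each observed for 1000 hours
total_hours = 10 * 1000.0
failures = [
    {"hours": 120.0, "essential": True, "dangerous": False},
    {"hours": 300.0, "essential": False, "dangerous": False},
    {"hours": 450.0, "essential": True, "dangerous": True},
    {"hours": 800.0, "essential": True, "dangerous": False},
]

curve = decrease_curve([f["hours"] for f in failures], n_products=10)
per_any = accrued_time_per_failure(total_hours, failures)
per_essential = accrued_time_per_failure(
    total_hours, failures, lambda f: f["essential"])
per_dangerous = accrued_time_per_failure(
    total_hours, failures, lambda f: f["dangerous"])

print(curve)
print(per_any, per_essential, per_dangerous)
```

The same selector mechanism extends to any other classification group recorded with the failures, such as simple versus complicated.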

In conclusion, we note the great practical importance of recording and
analyzing machine failures in operation. Precise information on failures
enables us to estimate the ability of machines to operate without
failures; information on failures caused by the destruction of parts
enables us to estimate the service life of machines and to determine
accurately the consumption rate of spare parts; information on the
difficulty, duration, and cost of the technical servicing and repairs
needed to prevent and eliminate failures enables us to estimate the
repairability of machines. Taken together, this information provides an
objective picture of the reliability of such machines and makes it
possible to find means of improving it.